A bile acid-related prognostic signature in hepatocellular carcinoma

Due to the high mortality of hepatocellular carcinoma (HCC), its prognostic models are urgently needed. Bile acid (BA) metabolic disturbance participates in hepatocarcinogenesis. We aim to develop a BA-related gene signature for HCC patients. Research data of HCC were obtained from The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) online databases. After least absolute shrinkage and selection operator (LASSO) regression analysis, we developed a BA-related prognostic signature in TCGA cohort based on differentially expressed prognostic BA-related genes. Then, the predictive performance of the signature was evaluated and verified in TCGA and ICGC cohort respectively. We obtained the risk score of each HCC patient according to the model. The differences of immune status and drug sensitivity were compared in patients that were stratified based on risk score. The protein and mRNA levels of the modeling genes were validated in the Human Protein Atlas database and our cell lines, respectively. In TCGA cohort, we selected 4 BA-related genes to construct the first BA-related prognostic signature. The risk signature exhibited good discrimination and predictive ability, which was verified in ICGC cohort. Patients were classified into high- and low-risk groups according to their median scores. The occurrence of death increased with increasing risk score. Low-risk patients owned favorable overall survival. High-risk patients possessed high immune checkpoint expression and low IC50 values for sorafenib, cisplatin and doxorubicin. Real-time quantitative PCR and immunohistochemical results validate expression of modeling genes in the signature. We constructed the first BA-related gene signature, which might help to identify HCC patients with poor prognosis and guide individualized treatment.


Materials and methods
Data acquisition. Research data of HCC were obtained from TCGA (https:// portal. gdc. cancer. gov) and ICGC (https:// dcc. icgc. org/) online databases. In TCGA database, we obtained sequencing data of 374 HCC samples and 50 normal samples. After matching with clinical data, 370 HCC patients were included. In ICGC database, the transcriptome data and corresponding clinicopathological data of 232 HCC patients were obtained. BAGs were collected from the Gene Set Enrichment Analysis (GSEA) database (http:// www. gsea-msigdb. org/ gsea/ index. jsp) using "bile acid" as the main search term, and 23 BA-related gene sets were found. These gene sets were comprised of genes that participate in BA biosynthesis, secretion, metabolism, and BA-related signaling pathways. After removing the overlapping genes, 199 BAGs were obtained. TCGA expression matrix did not contain the expression of MIR6886. Therefore, 198 BAGs were collected for further analysis and are displayed in Table S1. Figure S1 display the analysis flow of this study.

Identification of differentially expressed prognostic BAGs.
In TCGA cohort, we used p < 0.05, |log2 fold change (FC)|> 1 as standard to differentiate the differentially expressed BAGs between the normal group and the HCC group. The differentially expressed BAGs were displayed with a heatmap using "pheatmap" package in R4.1.2. Univariate Cox analysis was conducted to screen significantly prognostic BAGs after the expression data for the BAGs were combined with TCGA survival data. Subsequently, we intersected the differentially expressed BAGs with the prognostic BAGs to identify the differentially expressed prognostic BAGs for further analysis.

Construction and validation of the gene signature. Based on the differentially expressed prognostic
BAGs, we developed a risk model by using least absolute shrinkage and selection operator (LASSO) regression analysis. After LASSO regression analysis, 4 differentially expressed prognostic BAGs and its regression coefficients were obtained and were chosen for model building. Then, patient risk score was calculated with the following equation: Risk score = (Expression value of BAG1) * Coefficient 1 + (Expression value of BAG2) * Coefficient 2 + … + (Expression value of BAGn) * Coefficient n. According to the median risk score, we divided HCC cases into high-and low-risk groups. The discrimination ability of the signature was evaluated by principal component analysis (PCA). We utilized the area under the curve (AUC) to evaluate the prediction ability of gene signature. Tumor mutation burden and drug sensitivity analysis. Tumor mutation burden (TMB) scores were calculated and gene mutations were visualized via the "Maftools" package, which is easy to use and contains multiple statistical and computational approaches for cancer genome research 16 . According to patient gene expression data, the clinical chemotherapeutic response of patient can be accurately predicted by using the "pRRophetic" package 17 . The semi-inhibitory concentration (IC50) is a useful indicator of drug sensitivity and refers to the drug concentration required for 50% cell proliferation suppression in vitro. Therefore, a high IC50 indicates a low sensitivity of neoplastic cells to the drug. We predicted the drug sensitivities of HCC patients by using the pRRophetic package.
Cell culture and real-time quantitative PCR analysis (RT-qPCR). Human  Identification and functional enrichment analysis of the differentially expressed BAGs. Before constructing the prognostic signature, we screened the differentially expressed BAGs and analyzed the potential roles of these genes in hepatocarcinogenesis. 198 BAGs obtained from 23 BA-related gene sets were included in the differentially expressed analysis of TCGA cohort. We identified 44 genes as differentially expressed BAGs between the normal and tumor groups (Table S3). As shown in Fig. 1A, most of the differentially expressed BAGs were upregulated in the HCC samples. To explore the potential mechanism by which these genes contribute to HCC, GO and KEGG analyses were carried out. According to biological process (BP) analysis, Steroid metabolism, BA metabolism, BA and bile salt transport were significantly enriched. As indicated by cellular component (CC) analysis, these differentially expressed BAGs were significantly concentrated in peroxisomes and microbodies. Lipid, BA and monocarboxylic acid transmembrane transporter activity were significantly enriched categories according to GO molecular function (MF) analysis (Fig. 1B). KEGG (https:// www. kegg. jp/), an integrated database, establishes the connection between genomic information and higher order functional information, which contributes to the functional annotation of genes and proteins 18,19 . KEGG analysis was carried out to explore the pathway enrichment of differentially expressed BAGs, which is useful for understanding the functions of the genes. As indicated by KEGG analysis, the pathways involved with primary BA biosynthesis and bile secretion were enriched (Fig. 1C).
Construction of the BA-related risk signature in TCGA cohort. In order to constructed a BA-related risk signature in TCGA cohort, the prognostic BAGs were first screened by univariate Cox analysis from 198 BAGs with the criteria of P < 0.05. We identified 61 genes as significantly prognostic BAGs, and these genes are displayed in Table S4. Subsequently, 17 differentially expressed prognostic BAGs were obtained by intersecting the 44 differentially expressed BAGs with the 61 prognostic BAGs ( Fig. 2A). We subjected the common differentially expressed genes (DEGs) that we have obtained from intersection of differentially expressed BAGs with the prognostic BAGs to feature selection-LASSO regression method to select only those genes which are very relevant to formulate a bile acid-related prognostic signature ( Fig. 2B and C). Finally, 4 BAGs were selected for model building. The 4 BAGs and its regression coefficients are displayed in Table S5. The correlation network indicated that AKR1D1 had a negative relationship with other modeling genes (Fig. S2). As shown in Figs. 2D and S3A-3D, AKR1D1 was protective factor for overall survival, while the others were risk factors based on the univariate Cox analysis. In addition, AKR1D1 was upregulated, while the others were downregulated in the normal samples ( Fig. S4A-4D). Based on the 4 modeling genes and its regression coefficients, the equation for calculating the patient risk score was obtained and was as follows: Risk score = (Expression value of AKR1D1)*(− 0.0045) + (Expression value of NPC1) * 0.2253 + (Expression value of FABP6) * 0.1733 + (Expression value of MAPK3) * 0.0230. In TCGA cohort, patients were classified into high-and low-risk groups according to their median scores (Fig. 2E). The occurrence of death tended to increase with increasing risk score, which was further confirmed by survival analysis (Fig. 2F, G). The two groups were different in the expression of the www.nature.com/scientificreports/ modeling genes (Fig. 2H). The risk signature exhibited good discriminability and predictive ability ( Fig. 2I and J).
Validation of the BA-related risk signature in ICGC cohort. The model possessed good predictive performance in TCGA cohort. We then verified the predictive performance of the signature in external validation cohort. In ICGC cohort, we obtained patient risk scores with the above equation. Then, high-and low-risk patients were distinguished with the same median scores (Fig. 3A). Figure 3B shows an increase in mortality with increasing risk score. Kaplan-Meier (KM) plot further confirmed that high risk value hinted unfavorable overall survival (Fig. 3C). In high-risk patients, increased levels of the modeling genes were observed except AKR1D1 (Fig. 3D). The signature also displayed good differentiating capacity and predictive power in the external validation cohort ( Fig. 3E and F).
Independent predicting power of BA-related model. We subsequently explored whether the risk score was independent prognostic factor for HCC patients. After merging risk score with clinicopathological characteristics, independent prognostic factors were identified with Cox regression analysis. The prognosis of TCGA and ICGC patients was impacted by risk score and tumor stage, which revealed by univariate Cox regres- www.nature.com/scientificreports/ sion analysis in Fig. 4A and C. Risk score manifested independent predicting power for unfavorable survival, which corroborated by multivariate Cox regression analysis in Fig. 4B and D.
Risk score-related clinicopathological features. In order to understand the correlation between the model and clinicopathological features, we explored the discrepancy in risk score among patients with different clinicopathological features in TCGA and ICGC cohorts. We discovered risk score increased with increasing HCC severity. In TCGA cohorts, patients grading severity aligned with risk score (Fig. 5A). High risk scores were observed in patients with stage III-IV ( Fig. 5B and C). Other clinicopathological parameters, including age, gender, prior malignancy and cancer history, did not correlate significantly with the risk score ( Fig. S5A-5F).
Functional enrichment analyses based on risk score. According to the previous results, patient outcome varies with risk score. Whereafter, we conducted functional enrichment analyses to explore the possible mechanisms that contributed to this phenomenon. Based on risk grouping, we screened 219 genes as differential genes with the criteria of P < 0.05 and |log2FC|> 1 (Table S6). Subsequently, we conducted GO and KEGG enrichment analyses with clusterProfiler package. According to BP analysis, organic acid and steroid metabolism was significantly enriched. As indicated by CC analysis, these genes were significantly centralised in lipoprotein particles and protein-lipid complexes. In MF analysis, oxidoreductase activity and arachidonic acid monooxygenase activity were enriched (Fig. 6A). According to KEGG analysis, the pathways linked with complement and coagulation cascade, chemical carcinogenesis and bile secretion were concentrated (Fig. 6B). The pathways correlated with cell cycle and tumorigenesis were upregulated while the pathways correlated with drug metabolism and primary BA biosynthesis were downregulated in high-risk patients, as revealed by the GSEA (Fig. 6C).
Risk score-related immune status. The immune microenvironment plays an essential role in hepatocarcinogenesis, therefore, we conducted ssGSEA to evaluate the immune status. In high-risk patients, the proportions of activated dendritic cells (aDCs), macrophages and Tregs were increased, while the infiltration of natural www.nature.com/scientificreports/ killer (NK) cells exhibited the opposite trend (Fig. 7A). In low-risk patients, high scores of cytolytic activity and IFN response were observed (Fig. 7B). Patients with high immune checkpoint expression benefit more from immune checkpoint inhibitor therapy. Therefore, we evaluated the immune checkpoint expression in HCC patients. In our study, High-risk patients owned high immune checkpoint expression (Fig. 7C-H).
Differences in TMB and drug sensitivity based on risk grouping. TMB reflects the quantity of the somatic mutation in tumor tissues, which has a close relationship with response to immunotherapy and is considered as a promising immune-response biomarker 20,21 . Therefore, we assessed the TMB in HCC patients with different risk scores. Top 20 mutated genes were visualized by waterfall plots (Fig. 8A and B). TP53 is the most mutated gene in high-risk group. The mutation frequency of TP53 in high-risk group is 42% and the main mutation type is missense mutation. In low-risk group, CTNNB1 is the most frequently mutated gene and its mutation type is predominantly missense mutation. High-risk patients tend to have higher TMB (Fig. 8C). Drug www.nature.com/scientificreports/ resistance not only influences the therapeutic effect but also shortens patient survival. The prediction of drug sensitivity is a vital component in individual treatment. Thus, we evaluated the difference in drug sensitivity based on risk grouping. In high-risk patients, we found significantly low IC50 values for sorafenib, cisplatin and doxorubicin (Fig. 8D-F), which meant these drugs were more effective for these patients. The high-and low-risk patients had same sensitivity to mitomycin (Fig. S6).

Validation of the modeling genes.
According to the previous analyses, modeling genes are expressed differently in normal and tumor samples. We subsequently validated the protein and mRNA levels of the modeling genes in the Human Protein Atlas database and our cell lines, respectively. Compared with hepatocyte cell line LO2, HCC cell line HepG2 had higher mRNA expression levels of NPC1, FABP6 and MAPK3, which was  www.nature.com/scientificreports/ consistent with the bioinformatic analysis results (Fig. 9A-C). Elevated mRNA levels of AKR1D1 were verified in LO2 cells (Fig. 9D). In the Human Protein Atlas database, we verified the protein expression of 4 modeling genes. High levels of MAPK3 protein expression were observed in HCC samples, while AKR1D1 protein expressions were upregulated in normal tissues. FABP6 was not detected in normal or HCC samples (Fig. S7A-D). www.nature.com/scientificreports/

Discussion
The pathogenesis of HCC is extremely complicated and not fully understood. BAGs not only regulate the synthesis and metabolism of BAs but also play important roles in hepatocarcinogenesis. At present, most research focuses on the effect of a single BA gene on the biological behavior and prognosis of HCC. Considering the multiple impacts of BAGs on hepatocarcinogenesis, we constructed the first prognostic signature of BAGs to distinguish high-risk patients and guide personalized treatment. 44 differentially expressed BAGs were identified between normal and HCC samples based on 198 BAGs collected from 23 BA-related gene sets. Unsurprisingly, these genes were significantly concentrated in BA synthesis metabolism, which confirmed by functional enrichment analyses. Then the first BA-related prognostic signature was constructed based on 4 differentially expressed prognostic BAGs after a systematic analysis. Subsequently, we calculated patient risk scores with this signature. The risk signature exhibited good discriminability and predictive ability according to PCA, receiver operating characteristic (ROC) and KM curve analysis. Risk score manifested independent predicting power for unfavorable survival. In addition, we validated not only the predictive value and stability of genetic model but also the expression levels of modeling genes. www.nature.com/scientificreports/ The model we constructed contained 4 differentially expressed prognostic BAGs, namely, NPC1, FABP6, MAPK3 and AKR1D1. NPC1 encodes the membrane protein Nieman-Pick C1 (NPC1), which is critical for cholesterol export from lysosomes/late endosomes 22 . NPC1 mutation results in a life-limiting lysosomal storage disease, Niemann-Pick disease type C, and increases the risk of hepatocarcinogenesis 23,24 . FABP6 is a BA transporter in ileal epithelial cells and is critical for the enterohepatic circulation of BAs 25 . FABP6 also participates in the progression of numerous types of cancer [26][27][28] . Compared with adjacent normal liver tissues, overexpressed FABP6 was observed in HCC tissues 29 . Consistent with this, we obtained similar results in our study. However, the specific mechanisms of FABP6 in HCC need to be further studied. MAPK3, also known as extracellular signalregulated kinase 1 (ERK1), is a member of the MAPK signaling pathway, which participates in tumorigenesis and metastasis in multiple tumor types 30 . ERK1, together with ERK2, plays an important role in BA metabolism. www.nature.com/scientificreports/ BAG expression profile was altered in ERK1/2 knockout mice. ERK1/2 knockout decreased the expression of BA uptake genes and increased the expression of BA export gene 31 . CYP7A1 serves as a rate-limiting enzyme to participate in BA biosynthesis 32 . In human hepatocytes, ERK1/2 negatively regulated the expression of CYP7A1 and fibroblast growth factor 19 inhibited CYP7A1 expression partially through activation of ERK1/2 33 . In the pathogenesis of HCC, ERK1, but not ERK2, phosphorylated intestine-specific homeobox, resulting to its nuclear translocation and the expression of downstream genes related to cell proliferation, malignant transformation and the resistance to sorafenib 34 . The expression and activation of MAPK3 were upregulated in HCC tissues and cells 35,36 . In line with this, our results indicated that MAPK3 were highly expressed in HCC. In addition, MAPK3 was risk factors for overall survival. Delta(4)-3-Ketosteroid 5-Beta-Reductase encoded by AKR1D1 acts as one of the key reductases related to BA biosynthesis 37 . Aberrant expression of AKR1D1 contributes to BA synthesis defect, metabolic disorders and liver failure [38][39][40] . In human hepatoma cells, AKR1D1 regulated glucocorticoid clearance and the activation of the glucocorticoid receptor 41 . Overexpression of AKR1D1 significantly inhibited the cell viability and the activation of androgen receptor signaling pathway in HCC cell 42 . Low expression of AKR1D1 was observed in HCC tissue 43 . Here, HCC patients with elevated AKR1D1 had favorable prognosis, indicating that AKR1D1 might have an antitumor effect in HCC. AKR1D1 also had a good diagnostic ability for HCC 42 . Our model might be useful to guide the diagnosis of HCC and the genes included in our model might serve as promising diagnostic biomarkers for HCC. Further studies are warranted. Tumor immune microenvironment influences hepatocarcinogenesis and treatment strategies for patients 44 . BAs not only play an essential role in intestinal lipid absorption but also modulate immunity 6,45 . Therefore, we conducted ssGSEA to evaluate the immune status. In high-risk patients, the proportions of aDCs, macrophages and Tregs were increased, while the infiltration of NK cells exhibited the opposite trend. In low-risk patients, high score of IFN response was observed. In HCC patients, Treg intratumoral accumulation triggered by intratumoral macrophages suppresses tissue-derived CD4+ CD25− T cells activation, which contributes to HCC progression and unfavorable prognosis 46 . In addition, Tregs inhibit CD8+ T cells proliferation and activation in HCC patients 47 . NK cells form nearly half of the liver's lymphocytes. In HCC microenvironment, a high abundance of NK cells in HCC tissue is a favorable factor for survival 48 . Via a STAT3-dependent pathway, IFNs induce HCC cell apoptosis through blocking β-catenin signaling pathway 49 . Impaired antitumor immunity may contribute to adverse prognosis in high-risk patients.
Immune checkpoint inhibitors (ICIs) are new and effective treatment choices for HCC patients. However, as with other drugs, not all patients are similarly sensitive to ICIs. Distinguishing the ICI-sensitive population is important to individualized therapy. The interaction of immune checkpoint proteins with their ligands leads to T cells inactivation and immunosuppression. As monoclonal antibodies, ICIs can block the process by binding with immune checkpoint proteins, thus restoring T cell activation and immune attack 50 . ICIs might have better effects on high-risk patients who have upregulated expression levels of common immune checkpoint. TMB is a significant indicator of genomic stability 51 . Tumor with high TMB can generate more neoantigens, some of which can serve as signals to induce immune system activation and T-cell reactivity, thus increasing sensitivity www.nature.com/scientificreports/ of tumor cells to immunotherapy 52,53 . Therefore, TMB is an emerging biomarker for predicting ICIs response, and ICIs possess better effects in patients with a high TMB 54 . In our study, high TMB was discovered in high-risk patients. Chemotherapy is an important component in comprehensive treatment of advanced HCC. Predicting chemotherapeutic drug sensitivity is of great significance to individualized therapy. The gene expression may act as a surrogate for unmeasured phenotypes that are directly relevant to chemotherapeutic sensitivity, which provides the possibility to predict chemotherapeutic sensitivity based on gene expression 55 . Many researchers use gene expression data to predict drug sensitivity, including chemotherapeutic agents 56,57 . In our study, we evaluated the sensitivity to some chemotherapeutic and targeted drugs in HCC patients. Our results revealed that low IC50 values for sorafenib, cisplatin and doxorubicin were observed in high-risk patients, which indicated that high-risk patients responded better to sorafenib, cisplatin and doxorubicin. Therefore, our gene signature provides the basis for the individualized application of ICIs and contributes to guide individualized chemotherapy and targeted therapy.
Undeniably, there are some drawbacks in our study. First, the exact mechanisms of modeling genes in hepatocarcinogenesis have not been explored, and thus, further studies are needed. Second, the risk signature was constructed and verified based on public databases. The 95% confidence intervals of the risk scores were somewhat wide in the Cox regression analyses, and the AUC for the ICGC cohort was larger than that for the TCGA cohort, which might be partly attributed to the divergent risk factors and pathogenesis of HCC in different regions. Therefore, a larger, multicenter cohort is demanded to further test the predictive capability of risk signature.

Conclusions
In conclusion, we constructed the first BA-related gene signature, which presented a promising prediction performance and may be useful for making reasonable clinical decisions for HCC patients.